The project aims to create a model that ages teen anime characters into seniors, and vice versa. To do so, we use CycleGAN, which has been applied to similar problems in other domains. To the best of our knowledge, this project is the first to tackle this specific problem in the domain of anime characters.
We did not find a suitable labeled dataset that could be used directly. We did, however, find a scraper by Lukeperson for the AnimeCharacters website, which we heavily modified to improve its efficiency.
Next, the images had to pass through a face cropper before they could be used by the model, since only the face region is of interest. For this step, the croppers by Nagadomi and Katerina were used.
After these steps, the dataset needed to be enlarged to reduce the risk of overfitting, so multiple augmentation techniques were applied, such as flipping the images horizontally and adjusting brightness and contrast.
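The augmentations above can be sketched with NumPy as follows. This is a minimal illustration only; the offset and scaling ranges are assumptions for demonstration, not the exact values used in our pipeline:

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator):
    """Yield augmented copies of an HxWxC uint8 image."""
    # Horizontal flip: mirror the image along the width axis.
    yield image[:, ::-1, :]
    # Brightness: add a random constant offset, then clip back to [0, 255].
    offset = rng.integers(-30, 31)
    yield np.clip(image.astype(np.int16) + offset, 0, 255).astype(np.uint8)
    # Contrast: scale pixel values around the mean intensity.
    factor = rng.uniform(0.8, 1.2)
    mean = image.mean()
    yield np.clip((image.astype(np.float32) - mean) * factor + mean,
                  0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = list(augment(img, rng))
print(len(augmented))  # one flipped, one brightness-shifted, one contrast-scaled copy
```

Each input image thus yields several variants, multiplying the effective dataset size without collecting new samples.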
Scraped dataset example:
Cropped dataset example:
As mentioned, our problem statement is the ageing and de-ageing of anime characters. For ageing, the input is an image of a teen anime character and the output is an aged version of it. For de-ageing, this is reversed: the input is a senior anime character and the output is a de-aged version of it.
Since we found no existing models for ageing and de-ageing anime characters, there is no state of the art to compare against. We therefore used an architecture that solves the same problem for human faces, CycleGAN, which comes with the added benefit of not requiring a paired dataset.
The general CycleGan Architecture
Our specific GAN: Generator (Blue) and Discriminator (Pink)
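The cycle-consistency term at the core of CycleGAN can be illustrated with a minimal NumPy sketch. The two "generators" below are hypothetical invertible placeholders standing in for our convolutional networks, and the weight `lam` mirrors the lambda used in the CycleGAN objective:

```python
import numpy as np

# Toy stand-ins for the two generators; in the real model these are
# convolutional networks, used here only to show how the loss is formed.
def g_teen_to_senior(x):
    return x + 1.0

def g_senior_to_teen(y):
    return y - 1.0

def cycle_consistency_loss(x, lam=10.0):
    """L1 distance between x and its round trip through both generators,
    scaled by the cycle-consistency weight lam."""
    reconstruction = g_senior_to_teen(g_teen_to_senior(x))
    return lam * np.abs(x - reconstruction).mean()

x = np.random.default_rng(0).normal(size=(4, 64, 64, 3))
print(cycle_consistency_loss(x))  # ~0 (up to float rounding) for these inverse maps
```

This term is what lets CycleGAN train on unpaired teen and senior images: each image must survive a round trip through both generators largely unchanged.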
As mentioned before, no ready-made dataset existed for our task. The dataset is therefore an update in itself: we modified an existing scraper, cropped the results using two croppers, and then augmented the images.
This can be considered an independent contribution in its own right, as it may prove useful for future models that require large datasets of anime characters' faces.
In an attempt to find the model that works best for the domain at hand, we performed a variety of hyperparameter tuning, such as:
The original base model did not have a quantitative metric for evaluating the generated images. We therefore had to integrate our chosen quantitative metric, the Fréchet Inception Distance (FID), into the code.
The FID was introduced by Heusel et al. in 2017 and uses the pretrained Inception v3 (2015) model.
This metric measures the similarity between the feature-vector distributions of two groups of images, and it is one of the most popular quantitative metrics for GANs.
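Given Inception v3 feature vectors for the two image groups, the FID itself reduces to a closed-form distance between two Gaussians. A sketch of that computation is below; the random feature matrices are stand-ins for real Inception activations:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet Inception Distance between two sets of feature vectors
    (rows = images, columns = Inception activations):
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2*sqrt(C_a @ C_b))."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Matrix square root of the covariance product; numerical error can
    # introduce tiny imaginary components, which we discard.
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
shifted = rng.normal(loc=0.5, size=(500, 8))
print(fid(real, real))     # ~0: identical feature sets
print(fid(real, shifted))  # larger: the shifted set is farther away
```

Lower FID means the generated images' feature statistics are closer to the real ones, which is why a score near zero indicates high similarity.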
We laid the foundation for the problem of anime character ageing, and we can confirm that the initial model is, as of now, one of the two best-performing models. We have also noticed visually that some models reconstructed the original image perfectly but failed to apply proper ageing.
In addition, while quantitative metrics are more time-efficient, they are not always the most reliable. We observed this in our own work: models with different FIDs appeared to have the same visual success level. A balance should therefore be struck between quantitative and visual evaluation.
The model itself is the first step toward anime character ageing and de-ageing. We suggest varying the cycle-consistency loss weight further, giving more weight to the generator loss, for better ageing and de-ageing.
Additionally, the dataset can be useful for anime-related models in general. Finally, integrating the FID into CycleGAN can be useful for other use cases with a similar architecture.